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ABSTRACT 

A three dimensional parallel Monte Carlo (MC) dust radiative transfer code is presented. To 
overcome the huge computing time requirements of MC treatments, the computational power of vec- 
torized hardware is used, utilizing either multi-core computer power or graphics processing units. 
The approach is a self-consistent way to solve the radiative transfer equation in arbitrary dust con- 
figurations. The code calculates the equilibrium temperatures of two populations of large grains and 
stochastic heated polycyclic aromatic hydrocarbons (PAH). Anisotropic scattering is treated apply- 
ing the Heney-Greenstein phase function. The spectral energy distribution (SED) of the object is 
derived at low spatial resolution by a photon counting procedure and at high spatial resolution by a 
vectorized ray-tracer. The latter allows computation of high signal-to-noise images of the objects at 
any frequencies and arbitrary viewing angles. We test the robustness of our approach against other 
radiative transfer codes. The SED and dust temperatures of one and two dimensional benchmarks are 
reproduced at high precision. The parallelization capability of various MC algorithms is analyzed and 
included in our treatment. We utilize the Lucy-algorithm for the optical thin case where the Poisson 
noise is high, the iteration free Bjorkman & Wood method to reduce the calculation time, and the 
Fleck & Canfield diffusion approximation for extreme optical thick cells. The code is applied to model 
the appearance of active galactic nuclei (AGN) at optical and infrared wavelengths. The AGN torus 
is clumpy and includes fluffy composite grains of various sizes made-up of silicates and carbon. The 
dependence of the SED on the number of clumps in the torus and the viewing angle is studied. The 
appearance of the 10/im silicate features in absorption or emission is discussed. The SED of the radio 
loud quasar 3C 249.1 is fit by the AGN model and a cirrus component to account for the far infrared 
emission. 

Subject headings: Radiative transfer - Methods: numerical - dust, extinction - Infrared: general - 
Galaxies: active - quasars: individual: 3C249.1 



1. INTRODUCTION 

Dust obscured objects cannot be studied directly in 
the UV/optical, since the dust shields most of the vis- 
ible light. To derive the UV/optical component or 
to constrain the morphological structure from available 
near/far infrared data, a detailed model of the interaction 
of photons with the dust is required. This necessitates 
a solution to the radiative transfer (RT) equation. Ana- 
lytically, this can only be done in some simple configura- 
tions, for example by assuming spherical or disk symme- 
try in which either scattering or absorption is neglected 
and the wavelength dependency of the dust cross section 
is strongly simplified, e.g. by a gray body approxima- 
tion. Nature, however, is usually not well approximated 
by such assumptions and numerical modeling is the only 
way to solve this problem. 

Dust is detected in the majority of active galactic 
nuclei ( Ilaas ct al. 200H)- Acc ording to the unified 
scheme fAntonucci fc Milieilll985f ). AGN are surrounded 
by a dust obscuriiig toru s. This torus, as argued by 
iKrolik fc Begelmaiil ()1988l ). needs a clumpy structure to 



allow the survival of dust grains in regions where the 
gas temperatures (~ 10^ K) are extreme compared to the 
dust sublimation temperature. Indeed, Tristram et al.l 
(|2007D show with VLTI observations of the Circinus ac- 
tive galactic nuclei (AGN) strong evidence for a clumpy 
or filamentary structure of the nucleus. 

In this paper a numerical method is presented, which 
solves the radiative transfer equation in a thre e di- 
mens ional geometry, based on the MC technique (Witt| 
119771 : Sect 12]) ■ To reduce the required computa- 
tional efforL_dlfferent_0£timizat^^ 
oped (lLucvl[l99llBiorkman fc Wood"2001': 'Gordon et al.| 



2001 



20081: 



Misselt et al.ll2001l: [W)h 2003; Bacs 2008; Bianch 
Baes et al.l 120 lit Sect. [S]). Our numerical solution 
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of the radiative transfer equation takes advantage of some 
of these optimization algorithms and is specifically de- 
veloped to be vectorized and to run on graphics process- 
ing units (GPU). They are intro duced into the original 
code developed bv i Kriigell ()2008D . The MC routine han- 
dles arbitrary dust distributions in a three dimensional 
Cartesian model space at various optical depths. The 
self-consistent solution provides the dust temperatures; 
spectral energy distributions (SED) and images are com- 




Figure 1. 2D illustration of the three dimensional grid and the 
trajectories of the photon packets. The grid is divided into cubes, 
which can be further divided into sub-cubes. The source emits 
photons at (i), which interact or not with the dust. By interaction 
photon packets are either absorbed (ii) or scattered (iii). Multiple 
interactions may occur in one cell (iv) . After absorption the packet 
will be re-emitted by the dust at a different frequency. Frequency 
changes are illustrated by different line styles. Finally, (v) the 
packet escapes the model space. 

puted using a ray-tracer (Sect. |4]). The code is tested 
against existing benchmark resuhs (Sect. [S]). The influ- 
ence of dumpiness of an AGN dust torus on the SED 
and the 10/im sihcate feature is discussed and a model 
of the radio loud quasar 3C249.1 is presented (Sect. ^. 



2. MONTE CARLO RADIATIVE TRANSFER 



In 



our MC procedure the bolometric luminosity 
L ~ L L^dv of the central engine is divided into 
N = m rizyk monochromatic photon packets of equal en- 
ergy ^ = L/N, where m is the total number of frequency 
bins of the heating source and n^yk counts the number 
of photon packets per bin. The width of the frequency 
bin with index j between the frequencies Vj and h'j+i, 
Av = {h'j+i — Vj\, is derived from the spectral shape of 
the radiation source and the energy of a photon packet 
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The primary heating source emits photons for exam- 
ple in the optical and UV while the dust emission occurs 
at IR and sub-millimeter wavelength. Two frequency 
grids are considered: The first one is set up by Eq. ((J) 
and accounts for the emission of the primary heating 
source. The second frequency grid is an extended ver- 
sion of the first one and includes additional bins in the 
IR and sub-millimeter to scope with the dust emission. 
The interaction probability between photons and dust 
particles in the radiative transfer problem is computed 
by the absorption k"''" and scattering k^'^" cross-sections. 
The code includes different grain materials and size dis- 
tributions. 



The model space is set up as cubes in an orthogonal 
Cartesian grid. Each cube is divided into an arbitrary 
number of sub-cubes of volume Vi and constant density 
Pi (Fig. [1]), where the index refers to sub-cube i. The 
subdivision of cubes allows a finer sampling whenever re- 
quired: For example close to the dust evaporation zone, 
in regions of high optical depth or where the spatial gra- 
dient of the radiation field is large. One possible trajec- 
tory of a photon is illustrated in Fig. [T] Photons are 
emitted by the source at frequency Vj^ , where j^, is the 
index of the frequency grid of the source. The fiight path 
of the photon through out the model space is computed. 

The direction of the photon is chosen from a uniform 
distribution of random numbers Ci , C2 from the half open 
interval [0, 1), so that (j) = 27rCi and cos6' = 1 — 2^2- The 
distance from the entry point of a photon into a cell i 
to its exit point along the travel direction is £i. In the 
MC method the interaction of photons with the dust in a 
sub-cube can be d etermined with an unif orm distributed 
random number C (|Wittlll977tlLucvlll9990 and the optical 
depth 



nii^) ^ i^f^ + K^) p, i. 



(2) 



Photons leave the cell if tj(z^) < — logC and otherwise 
they interact with the dust. If a packet passes through 
a cell without interacting, it enters a neighboring cell 
according to its direction of travel. At the border of 
the model space it eventually escapes. When a photon 
packet enters a cell a new random number is chosen to 
determine the interaction probability of photons with the 
dust Q. The photon packet interacts with the dust after 
it has traveled a distance 
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The travel distance £'^ defines the point in the cell where 
interaction takes place and photons are either scattered 
(Fig. [TJii) or absorbed (Fig. [TJiii). Depending on its 
optical depth multiple scattering or absorption events 
may occur in a cell (Fig. [Uiv). In case of an interaction 
event the probability of scattering is given by the albedo 
Ai, = Kff"' I {nff"- + k"*"*) and therefore the chance of 
absorption is 1 — A^. 

When a photon is scattered on a dust grain the packet 
keeps its frequency, but changes its travel direction ac- 
cording to the phase f u nction of the particle. We use the 
IHenvev fc GreensteinI ()1941D phase function to approx- 
imate the anisotropic scattering (see Sect. H]). In the 
isotropic case the asymmetry factor g^, equals 0. The in- 
dex of the cell, the frequency of the scattered photon and 
the incoming direction is stored. This allows computing 
of the source function of the cell i, which will be used in 
a ray-tracer developed to compute the observed image 
at any frequency (Sect. 2]). 

When a photon is absorbed a new one with different 
frequency and direction is emitted from that position. 
The frequency of the emitted photon packet is given by 
the temperature of the absorbing material. The emitted 

^ It can be confirmed that the interaction probability calculated 
with a new random number is in agreement with the treatment of 
a trav el distance which is determined by a single random number 



photon has the same energy as the absorbed one. All 
photon packets contain the same amount of energy ^, 
therefore the total number of absorbed photon packets 
per particle and cell is used to calculate the temperature 
of each material. After k absorptions of photon packets 
by dust the cell i has a temperature Ti^k calculated from: 



j Kf'B,{T,^k)dv 
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(4) 



where B^{Ti,k) is the Planck function. 

Dust emits a photon packet at the shortest frequency 
u' calculated with: 



C' Bu{T,^k) diy > C «r S.m,fc) diy . (5) 

Jo 

In the MC method one follows all N photon packets 
through the dust cloud until they reach the outer bound- 
ary of the model and escape. For each cell the absorp- 
tion events are counted. When all packets escaped the 
model space, the number of absorptions in a cell is used 
(Eq. (j4|)) to calculate its dust temperature. An itera- 
tive MC method is started with an initial guess of the 
dust temperatures. This allows computing of the fre- 
quencies of packets emitted by the dust (Eq. (O). After 
all packets escape the model space another N photons 
are emitted by the source and their flight paths through 
the model are computed. However, this time using the 
dust temperatures of the previous run. This procedure 
continues until the dust temperatures converges, which 
usually takes about 3-5 iterations. 

2.1. Non equilibrium radiative transfer (PAH) 

The enthalpy state of very s mall grains su ch as poly- 
cyclic aromatic hydrocarbons (jTielensI 120081 ) fluctuates 
strongly when they are hit by photons. In order to solve 
the radiative transfer problem including PAH with MC 
we treat their emission as a stochastic process. A temper- 
ature distribution function P{T) of the PAH is computed 
using the iterative scheme by Siebenmorgen et al. (1992). 
PAH are implemented into the MC code in two steps: 
first, we compute the PAH absorption of photon packages 
and store position and frequency of these events. Then, 
after calculating the equilibrium temperature of the large 
grains (Sect. [5]), the PAH emission is computed. Each 
cell with PAH absorption becomes a source and emits 
the same number of photon packages as previously ab- 
sorbed by the PAH. These photons are traced through 
the model as described in Sect. [H The frequency of the 
emitted package is calculated according to: 



en{v)dv >i I en{v)dv 



with random number ^ and PAH emissivity 



A^)^K, 



PAH 



B,{T)Pn{T)dT 



(6) 



(7) 



where K^^^ is the PAH cross-section per unit mass of 
dust and P„(T ) is the temperature distr i bution function 
as described in iSiebenmorgen fc Kriigell ()2010D . 

Similar to the evaporation of large grains it is 
important to compute the location where PAH are 



able to survive the radiation environment. We 
treat the photo-dissociation of PAH as discussed in 
ISiebenmorgen fc Kriigell (|2010[ ). They find that PAH dis- 
sociate when, within a cooling time, the total absorbed 
energy -EpAH is larger than a critical energy E'crit '■ 



EpAH > E„it — 



Nr 



(eV), 



(8) 



where Nq is the number of carbon atoms per PAH 
molecule. In cells where the molecules dissociate the ab- 
sorbed photon packages are treated as if they are emitted 
from the star. Further details on our and other numeri- 
cal implementations of PAH in the MC code, a verifica- 
tion against benchmark models and an application to the 
PAH emission from p roto-planetary disks is g iven in an 
accompanying paper (jSiebenmorgen fc Hevmann 2012 ). 

3. OPTIMIZED MONTE CARLO RADIATIVE TRANSFER 

In this section we describe various optimization tech- 
niques which are introduced into our MC scheme to in- 
crease the accuracy and decrease computing times. 

3.1. Small optical depth 

The MC approach solves the radiative transfer prob- 
lem in a probabilistic way. Dust temperatures become 
uncertain and have large errors whenever the interac- 
tion probability is low. This is the case in cells of low 
optical depth where many photons fly-by without be- 
ing absorbed or scattered by the dust. The statistical 
noise in such cells can be re duced signiflcantly apply- 
ing a procedure developed by iLucvl (|l999l) . Contrary to 
the method described in Sect. JH in which only absorp- 
tion events contribute to the temperature calculation of 
the cell, the Lucy method considers the absorbed energy 
of every photon packet traveling through the cell. This 
technique increases the temperature accuracy of the MC 
treatment in low optical depth regions. It has no ad- 
vantage when the optical depth r^ > 1 as interactions 
between photons and dust are like ly. 

We follow Sect. 3. 4 of lLucvl (|1999l ) and implement a sim- 
ilar optimization technique. It is applied whenever the 
maximal optical depth of a cell i is smaller than 10~^. 
The performance of the MC scheme depends critically 
on this value. It is chosen so that the performance of 
the code is high while still accounting for reasonable ac- 
curacy. For smaller values the optimization is only ap- 
plied for extreme optical thin cells whereas for higher 
values the expensive accuracy enhancement is applied to 
cells which can be accurately computed without the Lucy 
scheme. With our choice the energy absorbed by the dust 
^abs jg ^ small fraction of the energy of a photon packet. 
The energy is computed by 



£:f''" = (i-e-^0^ 



(9) 



where tj is the optical depth from the entry to the exit 
point of the photon of cell i. Thereby the frequency of 
the photon packet remains unchanged. 

3.2. High optical depth 

The MC solution of the radiative transfer equation 
becomes slow for very optical thick regions, where the 
number of photon interactions with the dust increases 



exponentially. To avoid photons becoming trapped in 
cells with very high optical depth we apply a modified 
random walk procedur e. It is based o n a diffusion ap- 
proximation by Fleck fc Canfieldl (|1984[ ). The method is 
tested bv IMin e t al. (2009) and numerical enhancements 
are given by iRobitaille (2010) . 



3.3. Iteration-free Monte Carlo 
An iteration free MC scheme is de veloped by 



Bjprkman & Wood! (pOOl (B&W); see iBaes et al.l 
^05); kriigcl (200l) for helpful comments. In the B&W 
method the dust temperature in a cell is adjusted with 
respect to the previous absorption event. Contrary to 
Eq. (O , dust re-emission occurs at the shortest frequency 



for which 



dT - ^ 



" dT ^ ' 



is valid. Both methods described in Eq. ([SJ and 
Eq. (fTO|) are implemented in the MC code and can be 
used upon preference. 

3.4. Parallelization on graphics processing units 

The MC method of solving the radiative transfer prob- 
lem is particularly suited for parallelization be cause all 
photon packets are independent of each other ([Jonssonl 
[2001 iHevmaimI IMot Isiebenmorgen et al.ll201lD . Vec- 
torization of the code allows a number of photon packets 
to be emitted simultaneously. The number of packets 
launched at a time depends on the number of threadfQ 
available on the computer. The trajectories of the pho- 
tons can then be calculated in parallel. 

In parallel environments an additional requirement is 
placed on the random number generator; each thread 
must have an independent sequence of random num- 
bers. We solve this problem by applying a parallel 
version of the Mersenne Twist er algorithm as given by 
iMatsumoto fc Nishimural (pOOO. 

Parallelization is most efficient when all processing 
units finish their task at about the same time. Then 
vectorization speeds up the code roughly equal to the 
number of threads available. Unfortunately this is not 
always the case. The number of interactions of pho- 
tons increases exponentially with the optical depth ry 
and photon packets may get trapped in cells with say 
TV > 1000. The workload of these cells is much higher 
than for cells at much lower optical depth. This results 
in a rather unbalanced workload over all threads so that 
the advantage of vectorization may disappear. In our 
application idle threads are avoided as much as possi- 
ble. Every time a thread finishes the trajectory of the 
photon packet within the model space a counter is in- 
creased. If the counter reaches 80% of the total number 
of parallel working threads and if the number of inter- 
actions of the photon packet within a cell is larger than 
100 the thread pauses. In this case the position and fre- 
quency of the photon packet is stored. Close to the end of 
the simulation when all photons with average processing 

•^ A thread is the smallest unit of processing that can be sched- 
uled in a parallel environment. 



speed have escaped the model space, the stored photons 
are resumed. This procedure allows a balanced workload 
among all th reads. I n addition the modified random walk 
procedure of lRobita illc (2010) is implemented (Sect. [321) 
This code utilizes two different parallelization meth- 
ods. It can be used on shared memory machines using 
the openMP library to run as many parallel rays as there 
are processor cores available and on vectorized hardware 
(GPU) highly optimized for parallelization. The com- 
plete MC radiative transfer solution is ported to GPU 
using the Compute Unifi ed Device Arc hitecture (CUDA) 
developed by NVIDIA (jCUDAI [20Tl[) . This speeds up 
the entire MC solut i on in comparison to the method by 
iJonsson fc PrimacH ()2010l ). which provides only a GPU 
acceleration when computing the temperature of a grain. 

3.5. Numerical recipes 

For vectorized MC radiative transfer codes it is rec- 
ommended to follow some general recipes. In vector- 
ized systems the increased number of processing units 
decreases the computing time for numerical operations 
whereas the time needed for read-and-write operations 
remains unaltered. This implies a paradigm shift in the 
programming strategy. In single processing codes the 
performance is often improved by reading pre-calculated 
values from memory. Instead in a vectorized code at 
some point it is faster to calculate such values case by 
case. For example in our code, and as long as the dust 
frequency grid has less than 300 bins, it is faster to solve 
Eq. ([5]) on the fiy than reading pre-computed values 
from memory. 

The performance of parallel codes is increased consid- 
erably when designing a particular array structure for 
the particular hardware in use. A read operation from 
memory is most efficient when it accesses the entire array. 
For the CPUs used in this paper the memory bandwidth 
is 512 bit and we use a 32 bit machine. Therefore in our 
code the most efficient way to read from memory is by a 
modulus of 16 x 32 bit values. The design of a particular 
array structure is import for problems where the memory 
access is not predictable, which is unfortunately the case 
in Monte Carlo schemes. 

In order to improve the performance it is very efficient 
to code with as less interference between workers and 
threads as possible. The code should avoid situations 
where thread A waits until thread B has finished and 
then thread B waits until thread A has finished. Un- 
fortunately in MC simulations the dependency between 
workers and threads is often less obvious than in the pre- 
vious example. 

Another challenge with memory interactions is to use 
local copies of arrays for each thread as often as possible. 
This decreases the interaction between different threads 
and usually leads to large performance improvements, of 
course at the expense of some additional memory needs. 
When dealing with memory interactions between threads 
it is helpful to study the special hardware capabilities. 
In our case we gain a factor 2 in the computing time 
by using atomic integer operations rather than floating 
point operations. 

It is of advantage to optimize all read-and-write oper- 
ations to the fastest available source. This can be done 
by caching small amounts of data from the disk to the 
global memory, than to the GPU memory and finally 



Table 1 

Run time requirements of different MC methods. 



Method 


Threads 


Time 
(min.) 


Lucy 


1 


180 


B&W 


1 


60 


Lucy 


8 


45 


B&W 


8 


20 


GPU 


256 


3 


GPU+Lucy 


256 


4 


GPU+B&W 


256 


2 


GPU+B&W+Lucy 


256 


2 



to the GPU shared memory. For example in our code 
we store the wavelength dependent cross-sections in the 
GPU shared memory. This is about 10 times faster than 
reading them from the global memory. 

For MC simulations the random number generator 
(RNG) is important and special care must be taken on 
parallel systems. A detailed study of this topic is beyond 
the scope of this paper. We tested different RNG avail- 
able and find that the performance and accuracy as well 
as thread dependence differs significantly between vari- 
ous RNG implementations. It is therefor important to 
choose and test a RNG which fits the particular problem 
best. 

3.6. Parallelization and optimization algorithms 

The run time requirements of the different MC meth- 
ods are compared. Wc consider a star at temperature of 
2500 K with solar luminosity which heats a spherical and 
constant density dust envelope with an inner radius of 
0.7 AU and an outer radius of 700 AU. The optical depth 
measured from the star to the outer boundary is ry = 10. 
Parameters of the dust are specified as in Sect. 15.11 In 
the models 10^ photon packets are emitted from the star 
and 10^ grid cells are used. 

We apply the Lucy (Sect.3.1), B&W (Sect.3.3.) 
method in a scalar (single thread) , a CPU version with 8 
threads, a GPU version with 256 threads and in addition 
the iterative MC scheme of Sect. [5] on a GPU machine. 
Initializing times of the various MC methods are identi- 
cal. 

As described in Sect.3.1 for the same number of emit- 
ted photon packets the Lucy method provides a better 
temperature estimate than the other methods. In the 
Lucy method the fraction of a photon packet is con- 
sidered. This can only be realized by considering float- 
ing point operations which are more computer expensive 
than integer operations. In our test case the scalar ver- 
sion of the Lucy method is run with three iterations and 
requires a total run time of 3 hours (Tab. [T]). 

For single thread applications the iteration-free MC 
scheme by B&W (Sect. 13. 3p has the advantages that it 
speeds up the process by a factor which equals the num- 
ber of iterations required for convergence in the other 
MC treatments. However, in the B&W method mem- 
ory interaction between different cells are unavoidable 
and they slow down the parallelization capabilities of 
this method. The memory interaction between cells rises 
with the number of threads and therefore the amount 



of necessary atomic memory operation^ increases with 
parallelization. To minimize the number of atomic mem- 
ory operations we solve the problem in an iterative MC 
scheme using Eq. (|4]). This allows parallelization with 
a huge number of threads. At low budget this can be 
realized using GPU technology and these MC meth- 
ods are labeled as such in column 1 of Table 1. The 
GPU method may be further optimized in combina- 
tion with the Lucy (GPU-|-Lucy) or the B&W method 
(GPU-I-B&W). On our conventional computer we use a 
GPU with 256 threads (8 multiprocessor each with 32 
cores) clocked at 1.5 GHz. When compared to a single 
thread CPU application clocked at 3 GHz a speed up fac- 
tor due to vectorization of 60 is realized. This is below 
a theoretical expected speed up factor of 128 because of 
additional overheads produced by input/output routines 
and memory transfers in the GPU machine. 

Total run times of the various methods with different 
numbers of threads are given in Table [T] The vector- 
ized Lucy method scales slightly better with the number 
of available threads than the other algorithms because 
fewer atomic memory operations are required. The speed 
increase by vectorization as compared to scalar MC treat- 
ments is proportional to the number of available threads. 
The GPU method with B&W optimization is a factor 90 
faster than the original Lucy method. 

4. RAY-TRACER TO COMPUTE SEDS AND IMAGES 

Photons which eventually escape the MC model space 
in a particular solid angle can be counted and converted 
into a flux density. In principle this method allows com- 
puting images of the object. Unfortunately to obtain 
a moderate signal-to-noise ratio the number of photon 
counts per solid angle must be high which is difficult 
to reach within reasonable computing times. Here we 
present a ray-tracing method which allows computing 
of noise free images, SEDs and visibilities at high spa- 
tial resolution. The ray-tracer uses the temperature and 
scattering events of the cells calculated with the MC code 
(Sect. [5]). The uncertainty in the derived images is there- 
fore based on the precision of the MC computation. 

In the algorithm rays are traced through the model 
space from an observer plane of arbitrary orientation. 
The ray tracer together with the MC method allows us 
to calculate the flux received on each pixel of an im- 
age in the plane of the observer. The observed image 
is located at distance D from the object which is at the 
center of the cube. The orientation of the image plane 
is defined by its surface normal ez . The axis of the 
image plane Cx , Cy is perpendicular to Cz ■ The coor- 
dinates {x, y, z) of the 3D model space are transformed 
by a parallel projection into the coordinates {x',y') of 
the 2D image in the observers plane. The image consists 
of Ux X ny pixels, with a pixel size chosen so that the 
complete model fits in the projection. The center of the 
image ig is located at position [xo,?/o,2o] and has image 
coordinates [x[),y'Q]. The projected coordinates [x',y'] of 
the image are transformed into MC cube coordinates by 

^ Atomic operations are operations which are performed without 
interference from any other threads. Atomic operations are often 
used to prevent conditions which are common problems in multi- 
thread applications. 



r{x', y') ^io + {x' - x'q) ej + (y' - y'^) By , (11) 

where r — [x,y,z]. The ray-tracer follows the line of 
sight from each detector pixel in direction e^ through 
the MC model. Adding the contribution of emission and 
scattering from each cell i along the line of sight results 
in the observed intensity. The contribution of cell i to 
the total intensity is 



^.. = (^..+^,.)e- 



(12) 



where I^ ^ refers to the intensity of the dust emission and 
/* j to scattering; r^^i is the optical depth at frequency 
v of cell i. For convenience we drop the index i in the 
following. The optical depth is 



K 



abs 



p (r) ds 



(13) 



where K'^^^ is the absorption cross section per unit 
dust mass and I is the path lengths. We call the optical 
depth measured along the ray from the border of the cell 
to the observer r^, and the one measured face-to-face 
from one side of the cell to the other t^''" . The emission 
is 

/,^-[l-exp(-T-")] B,(T). (14) 

For scattering we implement eithe r a g-factor ap- 
proximation following the notation of iKriigell ()2008f ) or 
consider non-isotropic scattering applying the Henyey- 
Gree nstein phase function Phg (|Henvev fc GreensteinI 
[1941 defined as: 



^hg(/3) 



(1-5^) 



4^ [l+g^^2gcos{P)\ 



3/2' 



(15) 



where g is the anisotropy parameter, li g — the scat- 
tering is isotropic. The scattered light intensity is given 

by 

^i. = 2^ PiiGiP) -^ , (16) 

where Av is the width of the frequency bin, N{(3, !/)'*=" 
is the total number of scattering events with frequency v 
and angle /3 between the observer and the original scat- 
tering direction. This method to calculate the scattered 
int ensity is similar to the peel-ofF technique proposed 
bv lYusef-Zadeh et al.l ()1984D . The difference lies in the 
time of the calculation of the scattering intensity. The 
peel-off technique calculates the intensity during the MC 
run. Whenever a scattering event occurs the scattered 
intensity of the event is added to the observed images. 
Therefore in the peel-ofF technique the location of the 
observer has to be known before the MC run. This in- 
formation is not required in our MC scheme which stores 
only the scattering events. Then, in post-processing us- 
ing the ray-tracer, the contribution of the scattered light 
flux can be computed for arbitrary observer orientations. 
The flux density is calculated summing up the contribu- 
tion of each cell given by 



F„ 



Ar, 



D^ 



\lt 



exp(- 



(17) 



where t^, is the optical depth from the border of the cell 
to the observer and Apix is the area of the detector pixel. 
The ray-tracer calculates emission and scattering images 
at high signal-to-noise. It cannot remove the noise which 
is introduced by the MC simulation where the temper- 
ature distribution and scattering events are calculated 
with a stochastic sampling. However, for a known dis- 
tribution of the temperature and scattering events the 
ray-tracer provides noise free images. 

5. BENCHMARK 

The vectorize d MC method i s tested against the Id ray 
tracing code bv IKriigell ()2008[ ). Further verifications of 
the MC and comparison with benchmark tests are de- 
scribed below 



5.1. Spherical symmetry (ID) 

Ben chmark models in ID are provided bv llvezic et al.l 
()1997() . They consider a spherical dust envelope which is 
centrally heated by a 2500 K star with luminosity of \Lq. 
For this test the dust absorption and scattering coeffi- 
cient for wavelength below Aq ~ l/im are gabs = gsca — 1 
and are q^ca. = Aq/A and gabs = (Ao/A)"'' at longer 
wavelengths. Models are computed at optical depths of 
TV — 1, 10, 100 and 1000 with inner radius nn = 3.14, 
3.15, 3.20 and 3.52 AU, respectively. The dust density 
is constant and the outer radius is Tout = 1000 x rjn. 
The models are run on a grid with 10^ cells and 2 x 10* 
photon packets per iteration are launched from the star. 
The number of MC iterations are 3, 4 and 5 for ry < 10, 
100 and 1000, respectively. 

MC computed temperat ures are compared to the 
one calculated by DUSTY flve zic fc Elitzudll997t ) and 
Kriigcl (2008); they agree within <1%. The SEDs are 
shown in Fig. [2] in which each quadrant represents mod- 
els at different optical depths. In the quadrants the SED 
are shown in the top and the flux difference between MC 
and benchmark in the bottom panel. The SED of the 
MC models are either computed by photon counting or 
using the ray-tracer (Sect.|4]). The difference of the SED 
computed by MC and DUSTY is typically better than a 
few %. It becomes larger at faint fluxes where the ray 
tracing is more accurate than the packet counting proce- 
dure because of photon noise. For the ry = 1000 model 
the counting method is starved at fluxes below 0.2% from 
the peak flux. We encountered numerical inconsistencies 
in the DUSTY code at flux levels <10~* from the peak 
when compar ed against the radiative transfer code by 
IKrugell ((2008h . 

The differences between the reference codes and our 
MC program is mainly caused by Poisson noise. The 
photon noise is large at flux levels which are low com- 
pared to the peak flux. The wavelength range where the 
frequency grid of the source and that of the dust overlap 
is an additional source of systematic error. The source 
grid is build so that the energy of the photon packets of 
each frequency bin are identical. This results in a larger 
bin size at the Rayleigh- Jeans tail of the star where, on 
the other hand, the dust grid has a fine sampling because 
of the silicate and PAH features. To reduce this sampling 
problem it is necessary to increase the number of fre- 
quency bins of the star. Unfortunately, the total number 
of photons emitted from the star must be increased to 
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Figure 2. Comparison of SEDs of dust splieres at opt ical depths b etween ry = 1 and 1000 computed witli MC and ray— tracer (green) or 
photon counting (dashed), benchmark results in black dlvezic et al.l (l997i) '). Lower panels give the difference between benchmark and the 
procedures of this work. 



achieve a certain signal-to-noise for each bin. The num- 
ber of photons emitted from the star in each frequency 
bin must be large enough to reduce the Poisson noise to 
acceptable levels. We use 1000 frequency bins for the star 
and emit in each bin 10^ photons. This choice is based on 
a run time optimization for the achieved accuracy. The 
third source of systematic differences between the refer- 
ence codes and our MC scheme is the spatial grid. While 
the MC code uses a Cartesian grid in three dimensions 
with one level of refinement th e reference codes and in 
particular that of lKriigel ()2008l ) , uses an extremely opti- 
mized grid to account for the spherical symmetry. This 
systematic effect could be reduced by an increase of the 
number of MC grid cells. Current hardware limits the 
number of grid cells because of memory and speed con- 
straints. 

5.2. Disk geometry (2D) 

Different methods to com pute the RT in a 2D du st con- 
figuration are compared bv lPascucci et al.l ()2004[ ). They 
consider a star which heats a dust disk. The inner region 
r < Tin is dust free and the dust density distribu t ion is 
similar to those described bv lChiang fc GoldreichI ()1997l 
[T999.) : 



Table 2 

Parameters of 2D benchmark by IPascucci et al.l | |200'4I) 



Symbol Parameter 



Value 



M* 


Stellar Mass 


1 M0 




R* 


Stellar Radius 


IRq 




T* 


Stellar effective temperature 


5800K 




^in 


Inner disk radius 


1 AU 




r-out 


Outer disk radius 


1000 AU 




rd 


Half of outer radius 


500 AU 




2d 


Disk height 


125 AU 




a 


Grain radius 


0.12 /xm 




PO 


Density for ry = 1 


8.45 10-22 gcm" 


-3 


Tv 


Optical depth at 550 nm 


1, 10, 100 






p(r,.) = Po(^)e-(°-^ 


Tiz/hf-) 


(18) 


ith 


/^ = z.(-)^■^^^ 




(19) 




Td 







where r is the distance from the central star in the mid- 
plane of the disc and z is the height above the midplane. 
The parameter po is chosen such that the optical depth at 
550 nm along the midplane is 1, 10 and 100, which leads 
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Figure 3. Comparison of SEDs of a dust disk at optical depths along the midplane of ry = 100, face-on view (12.5°) is shown left and 
edge— on view (77.5°) right. SEDs are com puted with MC and ray-tracer (green) or photon counting (dashed). The mean SED of the 
benchmark results bv lPascucci et aLl 120041) is in black. Lower panels give the difference between the mean SED of the benchmark and the 
procedures of this work. 

by us and the mean SED of the benchmarks is typicahy 
a few % and larger deviation are found at fluxes below 
1% of the peak flux. This is visible at the border of 
the wavelength grid and in the lO/ini silicate absorption 
band where variations from one SED to another in the 
reference codes are also larger. 

The differences in the two dimensional benchmark and 
our code are twofold. There is the Poisson noise which is 
present in our as well as in the r e ferenc e. Note that the 
reference SED in iPascucci et al.l ()2004[) is computed as 
an average of various SEDs computed by different codes. 
The second source of systematic error is the grid. We 
use a three dimensional Cartesian grid with one level of 
refinement. The reference codes calculate the benchmark 
in a two dimensional grid optimized to account for the 
disc symmetry. It is possible to improve our solution by 
increasing the number of grid cells as well as the number 
of photon packages. However, the current hardware gives 
an upper limit to run time and memory requirements. 



to densities po = 8.45 10-^2^ 8.45 IQ-^i and 8.45 10~ 
gcm"'^. Additional parameters are specifled in Table [51 
The absorption and scattering cross sections are that of 
a 0.12/im sili c ate g rain (optical dat£0 are taken from 
IDraine fc Le^ (flQSl ). 

The SEDs calcu lated by the various algorithms and 
treatments used bv lPascucci et al.l ()2004[ ) agree to better 
than 20%. The RT in the disk is solved by our GPU code 
and the derived SED a re compared to an av eraged SED 
of the results given by iPascucci et al.l (|2004D 

In the MC program we use 3 x 10^ cells and run models 
with 2 X 10* photon packets per iteration. 

For the optical depth along the midplane of the disk 
with TV < 10 we use 3 iterations and found in the SED 
an overall agreement to the benchmark results to within 
a few %. In the ry = 100 case 4 iterations are used 
and the SED and differences to a mean SED over those 
computed in the reference models is shown in Fig. [3l 
Two different inclinations 12.5° (face-on), 77.5° (edge- 
on) are displayed. The residual between SED computed 
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Th e unified scheme of A GN (jAntonucci fc Milled 
119851 [Padovani fc Urrv. 119901) predict that the central 
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Figure 4. Spectral energy distribution of clumpy AGN torus mod- 
els at viewing angles between 5° and 85° and number of clouds of 
■'Vclump = 500, 1000 and 2000. 

engine is surrounded by dust so that the obscura- 
tion depends on the viewing angle. The detailed 
geometrical configuration of d ust around th e AGN 
is still a matter of debate ("Stalevs ki et al.l 120111 ). 
Observations are not able to resolve the inner part 
of the objects. Theoretical consid erations favor 

a toro idal and clumpy geom e trv (iNenkoya et aLl 

200l [Sciiartmann et alT [2005t iHqnig et al.l 120061: 



Nenkova et al.ll2008l: iSchartmann et al.ll2008ll20ldl ). Ob 
servational hints of a clumpy structure in which optically 
thick dust clouds surround the accretion disc of the cen- 
tral black hole are provided by the lO/zm silicate band. 
This band is detected in absorption and emission. The 
strength of the feature is much w eaker than predicted 
by homogeneous torus models (iPier fc KrolikI 119921: 



Granato & Danese 1994; Efstathiou & Rowan-Robinson' 
1995: van Bcmmcl & DuUcmond 2003). A silicate 
emission feature is also seen in obs cured AGN, where 
the broad em ission lines are hidden (jSturm et al.ll200"6l : 
iMason et al.l [2009). Observations are explained by 
postulating a clumpy torus structure. 
For the dust around AGN we consider fluffy agglom- 



erates of silicate and carbon as sub-particles. We use 
a power law size distribution: n{a) ex a^'^'^ with par- 
ticle radii between a_ < a < a-|- and a_ = 160A , 
a+ ~ 0.13/im. The bulk density of both materials is 
2.5g/cm'^. Dust abundances (ppm) are 31 [Si]/[H] and 
200 [C]/ [H], which agree wi th cosmic abundance con- 
straints (jAsplund et al.l [20091) . This gives a gas-to-dust 
mass ratio of 130. We take the relative volume frac- 
tions in composite grain to be 34% silicates, 16% car- 
bon and 50% vacuum, which translates, with abundances 
as above, to a relative mass fraction of 68% silicates 
and 32% amorphous carbon. Absorption and scatter- 
ing cross-sections and the scattering asymmetry fac- 
tor is computed with Mie theory and the Bruggeman 
mixing rule. Optical constants for silicates are taken 
from Drainc ( 2003) and fo r carb on we use the ACH2 
mixture from iZubko et al.l (|1996( ). This gives a total 
mass extinction cross section in the optical (0.55/im) of 
i^|f * = 350 00. The wavelength dependenc e of K"""* is 
displayed in iKruegel &: SiebenmorgenI (|1994D . 

The hard (UV/optical) AGN radiation emerges from 
the accretion disc around the massive black hole. Its 
spectral shape is sugges ted to follow a broken power law 
(|Rowan-Robinson|[l995l ) : 
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< O.Ol^m 
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O.OK 
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< O.lO/im 
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0.1 < 
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< l.Ojjm 


A-3 for 


1.0 < 


A 


< +00 



(20) 



In the following we consider a clumpy AGN dust torus 
of total luminosity of I/45 = lO'*^ erg/s. The inner ra- 
dius of the torus r-m is set by dust evaporation, which 
is approximated by r-m = 0.4 (_L45)"'^ pc. A clump is 
represented as a sphere of constant density and radius 
of 0.5 pc. The optical depth through the center of the 
clump is Ty = 30. The clumps are randomly distributed 
within a half opening angle oi = 45° between an inner 
radius of nn = 0.4 pc and outer radius of Tout = 6pc. 
For random numbers Ci ; C2 , Cs the position of the center 
of the clouds is computed by 



' = Ci (?'out - 
) = 2^C2 

'4 + (2C3 



(21) 



1)^1 



If clumps overlap the density is constant and the same 
as for a single clump. 

6.1. Influence on clumps on SED 

The AGN torus model is used to study the influence of 
clumps on the SED. In the models the number of clouds 
is varied using A^ciump — 500, 1000 and 2000. This gives 
a total optical depth along the midplane of the torus of 
TV ~ 80, 140 and 200. The SED is displayed in Fig. H 
at wavelengths < 1 /im, the flux is emitted by the central 
source and between 1/im and 50/im the SED is domi- 
nated by dust emission. The dust temperatures are be- 
tween 100 and 1500 K. At short wavelengths of the SED 
the AGN is visible only in the face-on view and becomes 
faint at higher inclination angles. At longer wavelengths 
the flux is stronger in the edge-on view as compared to 
observations at smaller inclination angles. The optical 
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Figure 5. Images at lOfira (top) and 0.55/im (bottom) of the clumpy AGN torus with A^clump = 1000 and viewing angles (from left to 
right) of 0°, 45° and 90°. . Color bar gives flux in (Jy/arcsec ). The source with luminosity L=10 erg/s is at a distance of 15Mpc 



depth of a single clump is high and already one or two 
of such intervening clouds define most of the spectral be- 
havior of the torus. The SEDs at viewing angles between 
45° and 85° are similar. At these angles a few clouds are 
always in the line of sight and dominate the spectrum 
(Fig.©. ... 

Depending on the viewing angle and the total optical 
depth and dust temperatures in the AGN, the 10 /xm sili- 
cate band is observed either in emission or in absorption. 
In AGN viewed face-on (type I sources) there is a direct 
view on the inner region of the tor us and a weak lOjum 
silica t e emission feature is observed (iSiebenmorgen et alj 
[2005 IHao et~all [20051 ISturm et al.ll2005D . In edee-on 
sources (type II) the 10/xm silicate band is seen in ab- 
sorption. 

The silicate feature trace the amount of dust for a 
given viewing direction and constr aints the distribution 
of the clumps in the torus (N ikutta et al.l [2009') . Ho- 
mogeneous AGN torus models without a clumpy struc- 
ture over-predict the strength of th e silicate feature 
(jEfstathiou fc Rowan-RobinsonI 119951 ) . In the clumpy 
torus models presented in Fig. |4| the 10/im feature is 
in emission for face-on views as long as A^ciump ^ 1000. 
Individual clouds are optically thick at 10/ini so that 



a single cloud already obscures the warm and optical 
thin region required to observe the silicate band in emis- 
sion. This is the case in the face-on view of the model 
with 2000 clumps where the emission and self-absorption 
of the silicate grains cancels out and the spectrum be- 
comes featureless. For inclinati on angles 6 > 45° t he sil- 
icate features is in absorption (jRoche et al.lll99l[) . The 
strength of the absorption feature depends on the optical 
depth and therefore increases with increasing number of 
clumps. 

6.2. AGN images 

The MC combined with the ray-tracer (Sect. [4]) allows 
computing of images of the object at a given wavelength. 
In Fig. [5] we present a lO/im and 0.55//m image of the 
clumpy AGN torus using iVdump = 1000. The mid IR 
image is dominated by dust emission and the optical im- 
age by scattered light. The face-on image in the mid IR 
provides an unobscured view of the emission from hot 
dust in the inner torus close to the AGN. In this image, 
clouds near the center dominate the emission. Further 
out, clumps become cooler and the contribution to the 
emission decreases. In the tilted view, at 45°, interven- 
ing clumps obscure part of the emission from the cen- 
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Figure 6. IR emission of the quasar 3C249.1. Photometric data 
by NED and Spitzer spectrum by Siebcnmorgcn ct al. (2005). The 
total flux (full line) is given as sum of the clumpy AGN torus model 
(dashed) and a cirrus component (dashed dotted). 

tral region and the contribution of cooler dust becomes 
more important than in the face-on view. In the edge- 
on view most of the central region is no longer visible 
and the emission is dominated by clouds located close to 
the border of the torus. However, close to the edge-on 
view and because of the clumpy nature of the medium 
there are still a few lines of sight penetrating to the in- 
ner torus. This is not the case in AGN models using a 
homogeneous dust distribution. For all viewing angles 
and clump configurations the scattered light images ap- 
pear similar to the emission images (Fig. [5]). However, 
the flux in the optical is two orders of magnitude fainter 
than at lO^m. The scattering light image in the edge-on 
view becomes fuzzy. The assumption of isotropic scatter- 
ing likely over predicts the scattered light in the face-on 
direction. However, the clumpy structure visible in the 
optical image is preserved. 

6.3. Quasar 3C 249.1 

The clumpy AGN dust torus model is applied to fit 
the SED of the quasar 3C249.1. The luminosity of the 
object is 7 X 10'*^erg/s at redshift of 2; = 0.311 which 
translates, using standard cosmological parameters, to a 
luminosity distance of 1600 Mpc. The torus inner edge is 
set by the sublimation temperature and leads to an inner 
radius of r^^ ~ 1 pc. In the inner region of the model at 
r < 25 X Tin the half opening angle of the torus is 9i = 25° 
and in the outer region up to rout = 50 x Tin it is 55°. The 
torus includes 5500 clouds. The number of clumps per 
volume element decreases in the inner part and increases 
in the outer part of the torus; other parameters remain 
the same. 

In order to fit data at A > 40/im we increased the 
outer radius of the torus rout and considered a two phase 
medium in which the density in between the clumps is 
varied. However, the far IR emission of the quasar cannot 
be explained by the AGN torus. The far IR is dominated 
by emission of cold dust located furtherout in the host 
galaxy. For simplicity we approximate this cirrus com- 
ponent by a modified black body, Kf,'"^B^{T) at 50 K. 
The dust torus model fits the silicate emission feature 
as observed in the Spitzer spectrum and together with 



the cirrus component a good fit to the overall SED is 
achieved (Fig. [6]). 

7. CONCLUSIONS 

A parallel 3D MC method is presented to solve the 
radiative transfer problem of dust obscured objects in 
arbitrary geometries. The code uses an adaptive three 
dimensional Cartesian grid. It utilizes the advantage of 
different optimization algorithms: in optical thin cells the 
Lucy algorithm and in cells at very hi gh optical depths 
a mod ified random walk procedure by i Fleck fc Canfieldl 
(|1984l) is included. The temperature of the dust is com- 
put ed in an iterative MC met hod or with the iteration- 
free iBiorkman fc WoodI ()200lD algorithm. 

The spectral energy distribution of the objects is de- 
rived by a photon counting procedure or a ray-tracing 
routine. Photons which eventually escape the MC model 
space are counted and converted into a fiux density. This 
method computes the SED of an object very quickly 
when a large aperture of several degree is used. Unfortu- 
nately, for high spatial resolution the method is starved 
by the low count rate. The SED of an object observed 
with a pencil beam is computed by a ray tracer which 
uses the dust scattering and absorption events of the MC 
cells as input. The ray tracer allows computing of noise 
free images of the scattered and emitted radiation at any 
frequency. 

MC schemes are known to be computing time expen- 
sive. Therefore we apply particular attention to solve this 
problem. We take advantage of the independence of the 
trajectories of each photon packet. This allows a highly 
vectorized design of the MC program which is not real- 
ized in previous work. With the vectorized code a speed 
improvement of about 100 is reached on a low budget 
computer when compared to a scalar, non-parallel ver- 
sion of the same program. This gain in the speed perfor- 
mance of the computations is reached by applying either 
a multi-core application using conventional central pro- 
cessing units (CPU) or the recent technological develop- 
ment of graphics power units (GPU). The parallelization 
capabilities of various MC radiative transfer algorithms 
are analyzed. By combining different MC algorithms we 
report a linear scaling of the computing time with the 
number of available threads (processor cores). The ray- 
tracer to compute SED and images is another computer 
time expensive procedure and is developed, similar to the 
MC code, in vectorized form. 

The 3D MC is tested against the ray tracer solution of 
the rad i ative transfer problem in spherical symmetry by 
iKriigell ()2008[ ). Further we verified our procedure against 
one and two dimensional benchmarks of the literature. 
The code reproduces the spectral energy and dust tem- 
perature distributions of the test cases at high accuracy. 
As a first astrophysical application we use the MC code 
to investigate the appearance of a dusty AGN at opti- 
cal and IR wavelengths; a second application on proto- 
planetary disks is presented in a n accompanying paper 
(jSiebenmorgen &: Hevmannll2011l ) 

The dust around the AGN is geometrically configured 
in a clumpy torus structure and is represented by fiuffy 
composite grains of various sizes made of silicate and 
carbon. The influence of the number of clumps in the 
torus on the SED is studied. We flnd that the clumpy 
torus explains the observed "weak" silicate absorption 
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and emission feature observed in AGN. Images of the 
AGN in the optical and the mid IR are presented by 
viewing the torus from different angles. The spectral 
energy distribution of the radio loud quasar 3C249.1 is 
fit by the torus model including a cirrus component for 
the far IR and submillimeter emission. 
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